ABSTRACT

The formative evaluation process described was 
designed to be used by Brigham Young University personnel in 1971 to 
evaluate and improve TICCIT programs and materials for community 
college credit courses. Involving the use of editorial judgment and 
data on student use, this process includes five levels of evaluation 
and revision: (0) materials are reviewed for subject matter accuracy,
instructional psychology, and message design; (1) materials pass 
through several cycles of formal debug, using skilled and critical 
students to look at every display with different types of mental sets 
characteristic of students and catch problems in answer processing; 
(2) lessons and units are tested on 20 or more students enrolled in 
convenient institutions to identify those lessons, segments, and 
displays which produce confusion or difficulty; (3) courses, lessons, 
and units are tested on 20 or more community college students to 
discover any remaining areas causing difficulties; and (4) courses, 
units, and lessons are tested on several hundred community college 
students. Modifications indicated by the results in each step are 
part of the ongoing process. The most advantageous role during this
period for Educational Testing Service (ETS), which would perform a
separate summative evaluation of the final revised version, was seen 
as facilitator of interchange between the developers and the users 
and as a facilitator of the formative evaluation process. 
(Author/CMV)

This paper is designed to communicate certain concepts regarding
formative evaluation of TICCIT courseware to teachers and administrators 
who plan to use the TICCIT system. 

The terms "formative evaluation" and "summative evaluation" were
introduced and distinguished by Michael Scriven in 1967 (Scriven, 1967).
Bloom (1956) has written a handbook for formative and summative evaluation
for teachers which is extremely useful as a guide for teachers involved
in small instructional development and evaluation projects. In general, the
distinction useful to the TICCIT project is that formative evaluation,
performed by Brigham Young University personnel, will exercise editorial
judgment and collect data on student use which can be used to improve the
courseware until the final revised version is installed in the summer of 1975.
Summative evaluation, on the other hand, is something which Educational
Testing Service (ETS) will perform independently, providing an overall
evaluation of the final system. The system will be compared with its design
goals and with other important instructional objectives.

The scientific method has been defined as "doing your damnedest
with your mind with no holds barred" to push forward the purposes of
science. A working definition of formative evaluation might be: formative
evaluation is a process of doing your damnedest with human judgment and
student data to locate and improve deficiencies in content accuracy,
instructional effectiveness, sensible and responsive decision logic, and
the organic unity and esthetics of instructional material.

Since both human judgment and data are used to improve various
aspects of the courseware, instead of just talking about a "formative
evaluation process," it is useful to distinguish five levels of evaluation
and revision through which courseware will pass. These five levels are
as follows:

0) Lessons (and unit material) are reviewed and revised for subject
matter accuracy and excellence, instructional psychology, and
message design. Lessons are input, and a mechanics debug corrects
"proofreading" level errors in the displays and logic.

1) Lessons (and unit material) pass through several cycles of formal
debug, using skilled and critical students who look at every display
with different types of mental sets characteristic of students.
The majority of problems in answer processing will be caught and
corrected by this step. Courseware is now in a form acceptable
to be used by students in credit classes, but these classes should
be backed up by more teaching personnel than will be necessary
later, for some percentage of the segments will fail to teach
adequately.

2) Lessons and units will be tested on 20 or more students enrolled
in convenient institutions (Utah Valley). Statistics will be used
to identify those lessons, segments, and displays which produce
confusion or difficulty. These will be revised.

3) Courses, lessons, and units will be tested on 20 or more community
college students at PC and NVCC. Based on these data, courses,
units, lessons, and displays which produce confusion or
difficulty will be modified. Aspects of the system design may
be modified. Aspects of the implementation plan and faculty
role models which cause problems or fall short will be modified.

4) Courses, units, and lessons will be tested on several hundred 
community college students at PC and NVCC. The same actions 
as in stage 3 will be taken as indicated by the data. 
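
The statistical screening described under levels 2 through 4 could be sketched roughly as follows. This is only an illustration in Python, not the TICCIT data reduction system. The record layout (student, segment, passed), the 30 percent failure cutoff, and the name flag_difficult_segments are assumptions introduced here; only the minimum of 20 students comes from the text.

```python
from collections import defaultdict


def flag_difficult_segments(records, max_failure_rate=0.3, min_students=20):
    """Return {segment_id: failure_rate} for segments that look troublesome.

    records: iterable of (student_id, segment_id, passed) tuples, where
    `passed` is True if the student succeeded on that segment. If a student
    has several records for one segment, the last record wins.
    The cutoff of 0.3 is a hypothetical value, not one given in the paper.
    """
    outcome = {}  # (student_id, segment_id) -> passed
    for student_id, segment_id, passed in records:
        outcome[(student_id, segment_id)] = bool(passed)

    totals = defaultdict(int)
    failures = defaultdict(int)
    for (_, segment_id), passed in outcome.items():
        totals[segment_id] += 1
        if not passed:
            failures[segment_id] += 1

    flagged = {}
    for segment_id, n in totals.items():
        if n < min_students:
            continue  # too few students to judge this segment
        rate = failures[segment_id] / n
        if rate > max_failure_rate:
            flagged[segment_id] = rate
    return flagged
```

Segments seen by fewer than min_students students are skipped rather than flagged, on the assumption that a handful of failures is not yet evidence of a weak segment.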

Table 1 relates the five levels above to the definition of formative
evaluation. Recall that this definition dealt with the application of human
judgment and student data to improvements in content, effectiveness, message
design, decision logic and other matters. In Table 1 the five levels
are listed as subheadings and there are two columns, one dealing with the
application of human judgment and the other dealing with the application of
student data to the process of making revisions. You will note that
revisions in content always rely on human judgment, in this case the judgment
of subject-matter experts. This judgment is applied first among the
development group and later at the colleges as faculty members provide
input on content inaccuracies and questions. Instructional effectiveness
may be addressed by the human judgment of an instructional psychologist
at an early stage in development. Ultimately, however, effectiveness
becomes a question which can only be answered on the basis of student
data. Message design is mixed between judgment and data, as indicated
by the horizontal brackets in Table 1. Human judgment is used through the
manuscript level. Following that, student data are collected before any
message design revisions are made.

Revising the decision logic is purely a function of student data. It
is only through the experience of a number of students, working through
the answer processing, instance files, and other complex parts of the
instructional logic that we can find loops, blind alleys, incongruities,
and other difficulties. It is usually a fairly slow process requiring many,
many students to take the material. Data from their interactions must be
recorded and summarized before all little problems in decision logic can
be ironed out.
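
One way to picture how recorded interactions might expose loops and blind alleys is the sketch below. It is a hypothetical illustration only: the paper does not describe the format of the recorded data, so the idea of a "path" (the ordered list of display identifiers one student visited), the set of designated exit displays, and the cutoff of three visits are all assumptions made for this example.

```python
from collections import Counter


def find_logic_problems(path, exit_displays, max_visits=3):
    """Inspect one student's ordered list of display ids.

    Returns (looping, blind_alley):
      looping     - sorted list of displays visited more than max_visits times
      blind_alley - the last display if the session ended somewhere that is
                    not a designated exit, otherwise None
    """
    visits = Counter(path)
    looping = sorted(d for d, n in visits.items() if n > max_visits)
    blind_alley = None
    if path and path[-1] not in exit_displays:
        blind_alley = path[-1]
    return looping, blind_alley


def summarize(paths, exit_displays, max_visits=3):
    """Count, across students, how often each display appears as a problem."""
    loop_counts, dead_end_counts = Counter(), Counter()
    for path in paths:
        looping, blind_alley = find_logic_problems(path, exit_displays, max_visits)
        loop_counts.update(looping)
        if blind_alley is not None:
            dead_end_counts[blind_alley] += 1
    return loop_counts, dead_end_counts
```

A display that turns up in these counts for many students is a candidate for revision; one that turns up for a single student may reflect only that student's behavior, which is why many students are needed.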

The effectiveness of instruction in segments and lessons is a matter
which can be addressed by instructional psychologists at the manuscript
level, but really can only be answered satisfactorily through the application
of student data in levels 2, 3, and 4. It is an axiom in instructional
psychology that human judgment should be used exclusively only when data
are lacking. It is probably true that some psychologists have failed to use
judgment at times, having become too cautious and distrustful of intuition,
emotion, creativity, etc. Some of them may lose the broad perspective
needed for good courseware development. A psychologist strictly from the
laboratory tradition and without a certain feeling for art, style, and creativity
is not always a good team member. We hope that we have been able to avoid
this pitfall.

The five evaluation levels planned for TICCIT courseware should be
compared with the evaluation and revision methods used in existing modes
of instruction. A typical mode would be a lecture course, planned and
executed by a single faculty member. Another would be the process of
developing a new textbook. These revealing comparisons are presented in
Table 2. Table 2 is organized in three columns. In the first column we
have described each of the five computer-assisted instruction evaluation
levels. In column 2 we discuss the analogous procedure for the preparation
of a lecture class, and in column 3 we compare the procedure for a textbook.

In viewing the analysis presented in Table 2 it can be seen that the
quality control of level 1 TICCIT courseware is more than equivalent to a
textbook or to a new lecture course introduced by a faculty member. The
editorial review from the perspectives of subject matter, instructional
psychology, and message design as well as the usual proofreading will
make the courseware not only respectable, but roughly equal to the presently
utilized "hard-copy" materials. The extent to which this individualized
material will effectively teach different kinds of students is still an issue,
but it cannot be resolved until data are collected in connection with steps
2, 3, and finally step 4. Obviously, step 2, which uses BYU students who
are not drawn from the population of community college students, will be
less accurate in identifying particular instructional weaknesses in lessons
and segments than data collected at the colleges themselves.

If the colleges feel that level 1 courseware would be suitable for
administration to their students (as we feel it would be for our BYU students),
then the only questions are:

1) How to back the material up with support from teachers sufficient
to assure that the students learn well?

2) How to maintain adequate control of data collection so that the
data will be correctly interpreted?



The first question can best be answered with data which give us
some idea of the percentage of lessons which might be difficult or ambiguous.
If this number is relatively small (say less than 10 percent), it would seem
that the risk would be small for going ahead within the colleges with several
formative evaluation classes. If the number is larger (say 30 percent), then
there may be a question as to whether it would be appropriate to do this.
There is also a question whether we will have sufficient resources left to
revise 30 percent or more of an already very extensive body of material. It
is most unfortunate that it has been impossible to test sample lessons on
the TICCIT system until this late in the project. A great loss of time and
money occurs because of the lack of information about how students will
respond to the final product. An enormous amount of revision effort could
be avoided by the ability to test developing lessons with students.
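
The reasoning about the percentage of difficult lessons amounts to simple arithmetic, sketched below in Python. The 10 and 30 percent figures are the illustrative values given above. The function names and the "unclear" reading for the band between the two figures are assumptions added for this example, since the text does not say how an intermediate value should be treated.

```python
def difficult_lesson_share(difficult, total):
    """Percentage of lessons judged difficult or ambiguous."""
    if total <= 0:
        raise ValueError("total number of lessons must be positive")
    return 100.0 * difficult / total


def risk_reading(percent, low=10.0, high=30.0):
    """Map a percentage onto the rough readings discussed in the text.

    Below `low` the risk of going ahead is read as small; at or above
    `high` going ahead is read as questionable. The middle band is
    labeled "unclear" here as an assumption of this sketch.
    """
    if percent < low:
        return "small risk"
    if percent >= high:
        return "questionable"
    return "unclear"
```

For instance, 4 difficult lessons out of 50 is 8 percent and reads as a small risk, while 15 out of 50 is 30 percent and reads as questionable.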

If the colleges decide to conduct a level 3 evaluation on site starting
September, 1974, then they may obtain benefits in terms of more effective
courseware and in terms of faculty development. The process of working
at either level 3 or level 4 in assisting in the collection of formative
evaluation data would be a new experience for most teachers and could be a
challenging and interesting experience. Never before has a community
college had the ability to get such closely detailed data, scrutinizing each
lesson, segment, and individual frame. Should teachers obtain some kind
of professional development credit for working in this environment? Could
this type of work lead to the creation of development expertise at the
colleges? In other words, can developmental evaluation teach enough
about the various courseware components and how students respond to them
to give faculty members an intuitive sense for the structure and function of
learner-controlled courseware? Would this enable them to learn rapidly
how to develop it themselves? These are some questions which must be
answered in developing a plan for implementation and testing next fall.

The probability that MITRE and BYU will complete very little courseware
through level 2 before installation in colleges raises questions about
the best role which Educational Testing Service should play. It does not
seem to serve the best interests of the colleges, BYU, MITRE, or the field
of computer-assisted instruction to view the year 1974-75 as a summative
evaluation year. Because of schedule slips, the courseware will not have
a fair chance to go through the formative evaluation process for effectiveness
of lessons and segments. Because of the expertise ETS has in test
and instrument design, and because of the good association they have with
both colleges, ETS might best function during the year 1974-75 by facilitating
the interchange between the developers and the users and by facilitating
the formative evaluation process which will be going on that year.



TABLE 1

USE OF HUMAN JUDGMENT AND STUDENT DATA
IN FIVE STEPS OF FORMATIVE EVALUATION

                        |--------------- Human Judgment ---------------|
                                                        |------------ Student Data -------------|
Level                   Content         Instructional   Message         Decision   Instructional
                                        Effectiveness   Design          Logic      Effectiveness
------------------------------------------------------------------------------------------------
0  Expert Reviewers     Independent     Instructional   Message-Design
                        Review          Review          Expert

1  Bright Students      Bright          Bright          X               X          X
                        Student         Student

2  20-30 Students       *               *               XX              XX         XX
   at BYU

3  20-30 Students       A Few College   *               XXX             XXX        XXX
   at College(s)        Faculty

4  Hundreds of          Numerous        *               XXXX            XXXX       XXXX
   Students at          College
   College(s)           Faculty

X  The number of X's indicates the relative importance of different sources of
   student data in indicating needed revisions.

*  While human judgment is obviously not abandoned in these cases, where student
   data are available, human judgment (which often reflects personal taste and
   style) must take full account of it before revisions are made.



TABLE 2

A COMPARISON BETWEEN COMPUTER-ASSISTED INSTRUCTION, LECTURES AND
TEXTBOOKS IN REGARD TO STAGES OF FORMATIVE EVALUATION AND REVISION

Level 0

CAI: Editorial evaluation by subject matter experts, instructional
psychologists and message design experts. Revision of manuscript material.

LECTURES: A new set of lecture notes is rarely reviewed by an independent
subject matter expert and never by instructional psychologists and message
design experts. Handouts prepared to accompany the lecture may be proofread
by a secretary.

TEXTBOOK: Textbooks are independently reviewed by subject matter experts
for content, prose and layout. Message design expertise is used only in a
narrow sense, constrained by traditional page layout formats.

Level 1

CAI: Three or four expert students go through a detailed formal debug
procedure. Errors, especially in logic and answer processing, are corrected.

LECTURES: Faculty members rarely, if ever, take time to subject their
lecture notes to three or four critical students and revise them accordingly
before they first deliver the course.

TEXTBOOK: The manuscripts for textbooks are on some occasions exposed to
some of the author's better students who will read and make appropriate
comments.

Level 2

CAI: Material which has been debugged formally is exposed to 20 or 30
students at the development site (BYU in this case). Data are collected and
revisions are made.

LECTURES: Not applicable because class lectures are rarely designed for
transportability.

TEXTBOOK: An author may expose the manuscript material to classes of
students and test it informally at his home institution. There are, however,
no formal procedures for collecting data and focusing these data on specific
lessons, segments and individual frames as will be done by the TICCIT data
reduction system. The possibility for collecting data this detailed is simply
not available to the author of a manuscript.

Level 3

CAI: TICCIT courseware is exposed to classes of 20-30 students in sections
at the colleges where it will be installed. Data are collected and sent back
to BYU for the appropriate revisions.

LECTURES: Usually, no data are collected, but revisions are made based on a
teacher's subjective interpretation of students' reactions and complaints
during the first semester. He will usually make revisions based on his
anecdotal information, plus his own feelings.

TEXTBOOK: Textbooks are rarely tested at other colleges, although some
professors have colleagues at another university who are willing to try their
manuscript in a class. Feedback for revision is quite informal and subjective.

Level 4

CAI: Based on data from hundreds of students, developers revise lessons and
segments that do not help enough students to succeed on lesson tests.

LECTURES: Over a long period of use the lecture notes for a given faculty
member's course are revised based upon the teacher's subjective
interpretation of students' reactions. He does no formal data collection to
focus his revision efforts; this is not a true formative evaluation.

TEXTBOOK: Data are rarely collected, but users may send back comments on
typographical errors and other matters. These are corrected in succeeding
editions of the book. The author may revise the book in four or five years,
but he does this to correct and update the content, rather than to improve
the instructional effectiveness of the textbook. At least one calculus text
has been corrected by offering a $5 reward for each new error detected in the
practice problem solutions.